Faster External Memory LCP Array Construction
Abstract
The suffix array, perhaps the most important data structure in modern string processing, needs to be augmented with the longest-common-prefix (LCP) array in many applications. Their construction is often a major bottleneck, especially when the data is too big for internal memory. We describe two new algorithms for computing the LCP array from the suffix array in external memory. Experiments demonstrate that the new algorithms are about a factor of two faster than the fastest previous algorithm.
1998 ACM Subject Classification: E.1 Data Structures, F.2.2 Nonnumerical Algorithms and Problems
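For orientation, the task of computing the LCP array from the suffix array can be sketched in internal memory with the classic linear-time method of Kasai et al. This is not one of the two external memory algorithms the abstract refers to, whose details are not given here. The function names and the naive suffix sorting step are assumptions made only for illustration.

```python
def suffix_array(text):
    """Naive suffix array by sorting suffixes (illustration only, not efficient)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """Kasai-style LCP construction from a suffix array.

    lcp[r] is the length of the longest common prefix of the suffixes
    at sa[r - 1] and sa[r]; lcp[0] is defined as 0.
    """
    n = len(text)
    # rank is the inverse permutation of sa: rank[pos] = position of suffix pos in sa
    rank = [0] * n
    for r, pos in enumerate(sa):
        rank[pos] = r

    lcp = [0] * n
    h = 0  # current match length, carried over between consecutive text positions
    for i in range(n):
        if rank[i] == 0:
            h = 0  # the lexicographically smallest suffix has no predecessor
            continue
        j = sa[rank[i] - 1]  # suffix preceding suffix i in sorted order
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # the next suffix shares at least h - 1 characters
    return lcp


if __name__ == "__main__":
    s = "banana"
    sa = suffix_array(s)
    print(sa, lcp_array(s, sa))
```

The access pattern of this method (random lookups through `rank` and `sa`) is the kind of behavior that becomes expensive once the arrays no longer fit in internal memory, which is the setting the abstract addresses.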
Similar resources
Low Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array
The longest common prefix (LCP) array is a versatile auxiliary data structure in indexed string matching. It can be used to speed up searching using the suffix array (SA) and provides an implicit representation of the topology of an underlying suffix tree. The LCP array of a string of length n can be represented as an array of length n words, or, in the presence of the SA, as a bit vector of 2n...
Space-Time Tradeoffs for Longest-Common-Prefix Array Computation
The suffix array, a space efficient alternative to the suffix tree, is an important data structure for string processing, enabling efficient and often optimal algorithms for pattern matching, data compression, repeat finding and many problems arising in computational biology. An essential augmentation to the suffix array for many of these tasks is the Longest Common Prefix (LCP) array. In parti...
Engineering External Memory LCP Array Construction: Parallel, In-Place and Large Alphabet
The suffix array augmented with the LCP array is perhaps the most important data structure in modern string processing. There has been a lot of recent research activity on constructing these arrays in external memory. In this paper, we engineer the two fastest LCP array construction algorithms (ESA 2016) and improve them in three ways. First, we speed up the algorithms by up to a factor of two ...
Optimal Time and Space Construction of Suffix Arrays and LCP Arrays for Integer Alphabets
Suffix arrays and LCP arrays are among the most fundamental data structures, widely used for various kinds of string processing. Many problems can be solved efficiently by using suffix arrays, or a pair of suffix arrays and LCP arrays. In this paper, we consider two problems for a string of length N, the characters of which are represented as integers in [1, ..., σ] for 1 ≤ σ ≤ N; the stri...
Dismantling DivSufSort
We give the first concise description of the fastest known suffix sorting algorithm in main memory, the DivSufSort by Yuta Mori. We then present an extension that also computes the LCP array, which is competitive with the fastest known LCP array construction algorithm.